With growing advocacy for environmental protection, offices have made rapid progress toward paperless systems. One proposed example is an apparatus that uses a scanner to read paper documents, or distribution material collected in a binder or the like, and retrieves the electronic document of the original (see, e.g., the specification of Japanese Patent No. 3017851).
However, because the above-described retrieval apparatus applies a bitmap-comparison technique uniformly to all images, retrieval efficiency and retrieval accuracy may decline depending upon the content of the document image. Accordingly, the applicant has considered a document management system that employs a retrieval technique referred to as “compound retrieval”. In this technique, when an original document is to be retrieved, a feature is first calculated for every attribute, such as text, photograph or line art, from both the original document and the image scanned in by the scanner. A plurality of degrees of matching are then judged comprehensively, examples being the degree of text matching, the degree of photographic image matching, and the degree of matching between layouts using layout information of each attribute.
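The comprehensive judgment described above can be illustrated with a minimal sketch. It assumes that each degree of matching has already been computed as a value between 0 and 1 and that the degrees are combined by a simple uniform average; the names `MatchScores`, `comprehensive_score` and `rank_candidates` are hypothetical, and the specification does not state the actual combination rule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MatchScores:
    """Hypothetical per-attribute degrees of matching, each assumed to lie in [0, 1]."""
    text: float    # degree of text matching
    photo: float   # degree of photographic image matching
    layout: float  # degree of matching between layouts


def comprehensive_score(scores: MatchScores) -> float:
    """Combine the degrees of matching with equal weight (an assumed rule)."""
    return (scores.text + scores.photo + scores.layout) / 3.0


def rank_candidates(candidates: dict[str, MatchScores]) -> list[tuple[str, float]]:
    """Return (document id, combined score) pairs, best match first."""
    scored = [(doc_id, comprehensive_score(s)) for doc_id, s in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because every attribute contributes equally in this sketch, a document whose content is almost entirely text is scored with the same weighting as one dominated by photographs.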
The documents handled in such a document management system are multifarious, ranging from documents having many text attributes to documents having many photograph and line-art attributes, and the layout (document content) differs greatly from document to document. With the above-described document management system, the retrieval results for every attribute are evaluated uniformly when the comprehensive judgment is made. Consequently, satisfactory retrieval accuracy is not obtained in an environment that has a mixture of documents of widely different layouts. Further, in order to deal with redundant documents and promote the reuse of documents, it is necessary not only to make a comprehensive judgment concerning all documents but also to search for documents in which only portions match. The need for such a partial-match search is pronounced even in situations where reference is made to photographs. For example, such a search is essential where different authors are involved in the same document, as in a case where royalties must be collected appropriately for separate portions of the document.
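The difference between a whole-document judgment and a search for partial matches can be sketched as follows. This is only an illustration under stated assumptions: each region of a document (for example, a text block or a photograph) is assumed to have a degree of matching between 0 and 1, and the function names and the threshold of 0.8 are hypothetical rather than taken from the specification.

```python
from __future__ import annotations


def whole_document_match(region_scores: list[float], threshold: float = 0.8) -> bool:
    """Judge the document as a whole from the average of its region scores."""
    if not region_scores:
        return False
    return sum(region_scores) / len(region_scores) >= threshold


def partial_match_regions(region_scores: list[float], threshold: float = 0.8) -> list[int]:
    """Return the indices of regions that individually reach the threshold."""
    return [i for i, score in enumerate(region_scores) if score >= threshold]


# Hypothetical scores: one reused photograph matches well, the rest does not.
scores = [0.95, 0.10, 0.20]
print(whole_document_match(scores))   # False
print(partial_match_regions(scores))  # [0]
```

In this sketch the document is rejected by the whole-document judgment even though one region matches closely, which is the situation a partial-match search is meant to catch.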